AITopics | combinatorial action

Collaborating Authors

combinatorial action

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Neural Information Processing SystemsDec-23-2025, 17:32:30 GMT

Value-function-based methods have long played an important role in reinforcement learning. However, finding the best next action given a value function of arbitrary complexity is nontrivial when the action space is too large for enumeration. We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem. As a motivating example, we present an application of this framework to the capacitated vehicle routing problem (CVRP), a combinatorial optimization problem in which a set of locations must be covered by a single vehicle with limited capacity. On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm. Our approach is competitive with other reinforcement learning methods and achieves an average gap of 1.7% with state-of-the-art OR methods on standard library instances of medium size.

application, combinatorial action, reinforcement learning, (6 more...)

Neural Information Processing Systems

Industry: Transportation > Freight & Logistics Services (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement learning with combinatorial actions for coupled restless bandits

Xu, Lily, Wilder, Bryan, Khalil, Elias B., Tambe, Milind

arXiv.org Artificial IntelligenceMar-17-2025

Reinforcement learning (RL) has increasingly been applied to solve real-world planning problems, with progress in handling large state spaces and time horizons. However, a key bottleneck in many domains is that RL methods cannot accommodate large, combinatorially structured action spaces. In such settings, even representing the set of feasible actions at a single step may require a complex discrete optimization formulation. We leverage recent advances in embedding trained neural networks into optimization problems to propose SEQUOIA, an RL algorithm that directly optimizes for long-term reward over the feasible action space. Our approach embeds a Q-network into a mixed-integer program to select a combinatorial action in each timestep. Here, we focus on planning over restless bandits, a class of planning problems which capture many real-world examples of sequential decision making. RMAB, a broader class of restless bandits with combinatorial actions that cannot be decoupled across the arms of the restless bandit, requiring direct solving over the joint, exponentially large action space. Our approach significantly outperforms existing methods--which cannot address sequential planning and combinatorial selection simultaneously--by an average of 24.8% on these difficult instances. Reinforcement learning (RL) has made tremendous progress in recent years to solve a wide range of practical problems (Treloar et al., 2020; Marot et al., 2021; Silvestro et al., 2022; Degrave et al., 2022). While successful at dealing with large or infinite state spaces, RL struggles with discrete, combinatorial action spaces. This limitation is pertinent to many real-world sequential decisionmaking problems, where resource constraints frequently lead to combinatorial action spaces (Dulac-Arnold et al., 2020). Consider, for example, a sequential resource allocation problem in which public health workers are dispatched to visit patients. The workers each have a limited daily budget to maximize patient well-being. These requirements give rise to an exponentially large combinatorial action space to optimize over, even when the number of workers and patients is small.

constraint, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2503.01919

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Public Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Neural Information Processing SystemsFeb-11-2025, 20:54:36 GMT

Summary and Contributions: - New reinforcement learning algorithm to solve capacitated vehicle routing problem. However, there are some observations for the Machine Learning community that are of some interest. There is an enduring interest in the reinforcement learning community to investigate ways in which reinforcement learning technologies can play a role in hard combinatorial optimisation settings. Here, following the cited 2018 NeurIPS publication by Nazari et al., the authors of the submitted manuscript develop and evaluate a novel reinforcement learning approach for the capacitated vehicle routing problem (CVRP). The CVRP is a hard combinatorial problem class that includes the Travelling Sales Person problem.

combinatorial action, reinforcement learning, vehicle routing, (7 more...)

Neural Information Processing Systems

Industry: Transportation > Freight & Logistics Services (0.99)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Review for NeurIPS paper: Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Neural Information Processing SystemsFeb-11-2025, 20:54:28 GMT

The paper proposes a novel reinforcement learning approach to solving the capacitated vehicle routing problem. It involves learning a value function and solving a TSP for the prizing problem. Reviewers agree that the proposed approach is novel and interesting. One reviewer is sceptical of the work because of doubts about the performance achievable with the proposed approach. However, the ideas presented still deserve to be presented at NeurIPS, with the hope of bringing advances to this research area.

combinatorial action, reinforcement learning, vehicle routing, (2 more...)

Neural Information Processing Systems

Industry: Transportation > Freight & Logistics Services (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Neural Information Processing SystemsJan-21-2025, 06:52:54 GMT

combinatorial action, reinforcement learning, vehicle routing, (3 more...)

Neural Information Processing Systems

Industry: Transportation > Freight & Logistics Services (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Combinatorial Neural Bandits

Hwang, Taehyun, Chai, Kyuwook, Oh, Min-hwan

arXiv.org Artificial IntelligenceMay-31-2023

We consider a contextual combinatorial bandit problem where in each round a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores. The score of an arm is an unknown function of the arm's feature. Approximating this unknown score function with deep neural networks, we propose algorithms: Combinatorial Neural UCB ($\texttt{CN-UCB}$) and Combinatorial Neural Thompson Sampling ($\texttt{CN-TS}$). We prove that $\texttt{CN-UCB}$ achieves $\tilde{\mathcal{O}}(\tilde{d} \sqrt{T})$ or $\tilde{\mathcal{O}}(\sqrt{\tilde{d} T K})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel matrix, $K$ is the size of a subset of arms, and $T$ is the time horizon. For $\texttt{CN-TS}$, we adapt an optimistic sampling technique to ensure the optimism of the sampled combinatorial action, achieving a worst-case (frequentist) regret of $\tilde{\mathcal{O}}(\tilde{d} \sqrt{TK})$. To the best of our knowledge, these are the first combinatorial neural bandit algorithms with regret performance guarantees. In particular, $\texttt{CN-TS}$ is the first Thompson sampling algorithm with the worst-case regret guarantees for the general contextual combinatorial bandit problem. The numerical experiments demonstrate the superior performances of our proposed algorithms.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2306.00242

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > New York (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback